Method for evaluating sound and system for carrying out the same

ABSTRACT

A system and method (and storage media) for identifying the category of a noise or sound source by using physical factors derived from the autocorrelation function (ACF) and the interaural crosscorrelation function (IACF), which change continuously in the time domain, based on a model of the human auditory-brain system. A method for evaluating a sound comprises the steps of: using a sound recorder to capture and record an acoustic signal; calculating the ACF from the acoustic signal using a CPU; calculating ACF factors from the calculated ACF using the CPU; and identifying the kind of the noise based on the ACF factors. Thereby both the kind of an unknown noise source (such as automobile noise or factory noise) and its type (such as the type of car or the type of machine) can be identified.

FIELD OF THE INVENTION

The present invention relates to a system and a method for evaluating noise and sound, and more particularly, relates to a system and a method for evaluating noise and sound based on an autocorrelation function (hereinafter called “ACF”) and an interaural crosscorrelation function (hereinafter referred to as “IACF”).

The present invention also relates to a system and a method for measuring and subjectively evaluating environmental noise such as automobile noise and aircraft noise, and more particularly, relates to a system and a method for measuring and subjectively evaluating noise on the basis of a binaural system.

BACKGROUND ART

Environmental noise such as traffic noise and aircraft noise has previously been discussed in terms of a sound pressure level and its frequency characteristic, measured monaurally by using a noise level meter. However, it has been recognized that the physical factors measured monaurally by the noise level meter alone cannot sufficiently and appropriately express the subjective response of human beings. In the field of concert hall acoustics, it has been revealed that physical data of the hall in a binaural system has relevance to psychological (subjective) evaluation, whereas, in the field of environmental noise research, only physical data such as spectrum information in the monaural system has been dealt with.

Up to now, in the music field, in order to tune instruments as well as to evaluate tone, the sound spectrum has first been analyzed, and then cepstrum analysis has been performed.

For many years, the environmental noise has been evaluated in terms of the statistical sound pressure level (SPL), represented as L_(x) or L_(eq) and its power spectrum measured by a monaural sound level meter. The SPL and power spectrum alone, however, do not provide a description that matches subjective evaluations of the environmental noise.

Also, it is difficult to express psychological responses to sound appropriately by the conventional methods for evaluating timbre and for tuning tone.

It is an object of the present invention to provide a system, a method and a storage media for identifying a category of a noise source by using physical factors derived from an autocorrelation function ACF which always changes in the time domain as well as from an interaural crosscorrelation function IACF of a binaural signal on the basis of the human auditory-brain system.

It is another object of the present invention to provide a system, a method and a storage media for subjectively evaluating more precisely timbre, scale, loudness, pitch, tone color, perception of duration, subjective diffuseness, apparent source width for the sound field by using physical factors derived from ACF and IACF which are ever changing in the time domain based on the human auditory brain function system.

SUMMARY OF THE INVENTION

In order to attain the above mentioned objects, a method for evaluating noise and sound according to the present invention comprises the steps of:

capturing the sound and converting the captured sound into an acoustic signal;

calculating an autocorrelation function ACF by processing the acoustic signal with the aid of a computing means;

calculating at least one autocorrelation function factor (ACF factor) from the calculated ACF with the aid of the computing means; and

evaluating the sound in accordance with said at least one ACF factor and a preestablished database with the aid of the computing means, said database storing at least one of the following data: tone data associating tones with ACF factors, prosodic data associating prosodic elements with ACF factors, and subjective evaluation data associating subjective evaluation values with ACF factors.

According to the invention, the sound can objectively be evaluated by reading data from said preestablished database (storing data that associate various kinds of information of various musical instruments which have been evaluated to generate good sounds (scale values of, for instance, tone color, prosodic element, timbre and subjective diffuseness, scale values of apparent source width ASW and subjective evaluation values) with ACF factors and IACF factors); and by comparing the readout data with the ACF factor extracted from the acoustic signal of the target sound to derive a difference therebetween, numerical value of the difference or a degree of the difference. When the sound evaluating method according to the present invention is applied to manufacturing of musical instruments, adjustment of tone color or timbre and tuning of tonal scale, it is objectively and appropriately possible to attain an instrument which would be evaluated subjectively to have good tone color. Also, an instrument can objectively and appropriately be tuned. In other words, according to the invention, instruments which have been manufactured in accordance with craftsman's intuition could be manufactured in accordance with the objective data.

In an embodiment of the sound evaluating method according to the invention, said step of calculating at least one ACF factor comprises calculating at least one of the following ACF factors: energy Φ(0) represented at the origin of the delay (i.e. delay time is zero); an effective duration τ_(e); a delay time of a maximum peak τ₁; an amplitude of the maximum peak of the normalized ACF φ₁; and information of respective peaks within the delay time (from zero to τ₁) τ_(n)′, φ_(n)′ (n=1, 2, 3, . . . , N (N is an integer less than approximately 10)).

According to this embodiment, a musical instrument having a subjective evaluation denoting better tone color can be more objectively and more appropriately manufactured based on the various ACF factors mentioned above and the instrument is also more objectively and more appropriately tuned.

Another embodiment of the sound evaluating method according to the invention further comprises the steps of:

calculating a pitch frequency based on the delay time τ₁ of the ACF; and

comparing the calculated pitch frequency with data of a predetermined tonal scale database to derive a difference therebetween to perform tuning.

According to this embodiment, because the reciprocal of the τ₁ calculated from the sound signal correlates with the pitch frequency, the musical scale of the intended instrument can be tuned appropriately. In this connection, the predetermined tonal scale database may be superseded by the said predetermined database.
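The tuning step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of cents as the unit of difference, and the stand-in tonal scale database (one octave of twelve-tone equal temperament starting at A4 = 440 Hz) are hypothetical and are not specified in this description.

```python
import math

def pitch_from_tau1(tau1_s):
    """Pitch frequency in Hz taken as the reciprocal of the ACF peak delay tau_1 (seconds)."""
    return 1.0 / tau1_s

def tuning_offset_cents(pitch_hz, scale_hz):
    """Return (nearest scale frequency, deviation from it in cents) for a measured pitch."""
    nearest = min(scale_hz, key=lambda f: abs(math.log2(pitch_hz / f)))
    return nearest, 1200.0 * math.log2(pitch_hz / nearest)

# Hypothetical tonal scale database (assumption): equal temperament, A4 = 440 Hz upward.
SCALE_HZ = [440.0 * 2.0 ** (k / 12.0) for k in range(13)]

pitch = pitch_from_tau1(0.00225)            # example value: tau_1 = 2.25 ms
nearest, cents = tuning_offset_cents(pitch, SCALE_HZ)
print(f"pitch {pitch:.1f} Hz, nearest scale tone {nearest:.1f} Hz, offset {cents:+.1f} cents")
```

A positive offset would indicate that the measured tone is sharp relative to the nearest entry of the scale database, and a negative offset that it is flat.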

Another embodiment of the sound evaluating method according to the invention further comprises the steps of:

capturing the sound in a binaural manner and converting the captured sound into an acoustic binaural signal;

calculating an interaural crosscorrelation function IACF between left and right channels from the acoustic binaural signal with the aid of computing means;

calculating at least one interaural crosscorrelation function factor (IACF factor) from the calculated interaural crosscorrelation function IACF with the aid of the computing means; and

evaluating the sound or evaluating subjectively the sound based on the IACF factors and/or the ACF factors and the said preestablished database with the aid of the computing means.

According to this embodiment, the sound evaluation and the subjective sound evaluation can objectively and appropriately be accomplished by comparing evaluation values of spatial subjective sensations, such as subjective diffuseness, extracted from the IACF with data read out of the database to derive differences between them.

The principal conception of the present invention may be realized not only as the method mentioned above but also as a system.

For instance, a sound evaluation system according to the invention comprises:

sound capturing means for capturing a sound and converting the captured sound into an acoustic signal;

ACF calculating means for calculating an autocorrelation function ACF from the acoustic signal;

ACF factor calculating means for calculating autocorrelation function factors from the calculated autocorrelation function ACF; and

evaluating means for evaluating the sound based on the ACF factors and a predetermined database storing at least one of the following data: tone data associating tones with ACF factors, prosodic data associating prosodic elements with ACF factors, and subjective evaluation data associating subjective evaluation values with ACF factors.

In an embodiment of the sound evaluating system according to the invention, said ACF factor calculating means comprises calculating means for calculating at least one of the following ACF factors: energy Φ(0) represented at the origin of a delay (i.e. delay time is zero); an effective duration τ_(e); a delay time of a maximum peak τ₁; an amplitude of a maximum peak of a normalized ACF φ₁; and information of respective peaks within the delay times (from zero to τ₁) τ_(n)′, φ_(n)′ (n=1, 2, 3, . . . , N (N is an integer less than approximately 10)).

In still another embodiment of the sound evaluation system according to the invention, the system further comprises:

pitch frequency calculating means for calculating a pitch frequency from τ₁ of the ACF; and

tuning means for comparing the calculated pitch frequency with data read out of a predetermined tonal scale database to represent a difference between them.

In still another embodiment of the sound evaluating system according to the invention, the system further comprises:

capturing means for capturing the sound in a binaural manner and converting the captured sound into an acoustic binaural signal;

IACF calculating means for calculating an interaural crosscorrelation function IACF between right and left channels from the acoustic binaural signal;

IACF factor calculating means for calculating interaural crosscorrelation function factors from the calculated interaural crosscorrelation function IACF; and

evaluating means for evaluating the sound or evaluating subjectively the sound based on the IACF factors and/or the ACF factors and the said preestablished database.

According to a further aspect of the present invention, a method for identifying a kind of a noise source comprises the steps of:

capturing and recording a sound signal from an environmental noise source to be identified using sound recording means;

calculating an autocorrelation function ACF from the recorded sound signal by Fourier transform with the aid of computing means;

calculating autocorrelation function factors from the calculated autocorrelation function ACF with the aid of the computing means; and

identifying a kind of the noise source based on the calculated autocorrelation function factors with the aid of the computing means.

In a preferable embodiment of the sound source identifying method according to the invention, said autocorrelation function factor calculating step comprises a step of calculating the following autocorrelation function factors: energy Φ(0) represented at the origin of delay (i.e. delay time is zero); effective duration τ_(e); delay time of a first peak τ₁; and amplitude of a first peak of a normalized ACF φ₁ from said autocorrelation function ACF; and said identifying a kind of a noise source comprises the steps of:

calculating logarithms of the energy Φ(0) represented at the origin of a delay (i.e. delay time is zero), an effective duration τ_(e), a delay time of a first peak τ₁, and an amplitude of a first peak of a normalized ACF φ₁ (the τ_(n)′ and φ_(n)′ may optionally be added) and deriving absolute values of differences (i.e. distances) between these logarithms and corresponding logarithms in templates previously made in accordance with the respective autocorrelation function factors of various noise sources;

deriving weighting coefficients for the respective autocorrelation function factors by dividing standard deviations (S₂) of arithmetic means of the respective autocorrelation function factors by an arithmetic mean (S₁) of standard deviations of all categories of the autocorrelation function factors to derive quotients and by calculating square roots of the quotients;

multiplying the respective distances with corresponding weighting coefficients for the respective autocorrelation function factors to find a total distance; and

comparing the thus obtained total distance with the distances in the stored templates to select one of the distances, which is proximate to the total distance.

According to another aspect of the invention, a method for evaluating subjectively a noise source comprises the steps of:

recording acoustic signals of an environmental noise in a binaural manner using sound recording means;

calculating an autocorrelation function ACF and an interaural crosscorrelation function IACF between right and left ear channels from the acoustic signals with the aid of computing means;

calculating autocorrelation function factors from the ACF and/or calculating interaural crosscorrelation function factors from the IACF with the aid of the computing means; and

subjectively evaluating a noise source based on the autocorrelation function factors and/or the interaural crosscorrelation function factors with the aid of the computing means.

The present invention has been mainly described as methods, however it is understood that the present invention may be realized as systems corresponding to the methods, programs embodying the methods as well as a storage media storing the programs.

Description of many subjective attributes such as preference and diffuseness, as well as primary sensations (loudness, pitch, and timbre), can be based on a model of the response of the human auditory-brain system to sound fields, and the predictions of the model have been found to be consistent with experimental results. The loudness of band-limited noise, for example, has recently been shown to be affected by the effective duration of the autocorrelation function (ACF), τ_(e), as well as by the SPL. When the fundamental frequency of complex tones is below about 1200 Hz, the pitch and its strength are influenced by the delay time of the first peak τ₁ and the amplitude of the first peak of the normalized ACF φ₁, respectively. In particular, the ACF factors obtained at (τ_(e))_(min) are good indicators of differences in the subjective evaluation of the noise source and the noise field.

The model comprises autocorrelators for the signals at the two auditory pathways and an interaural crosscorrelator between these signals, and it takes into account the specialization of the cerebral hemispheres in humans. The ACF and interaural crosscorrelation function (IACF) of sound signals arriving at both ears are calculated. Orthogonal factors Φ(0), τ_(e), τ₁, and φ₁ are extracted from the ACF. The IACF factors LL (sound pressure level), IACC (peak magnitude), τ_(IACC) (delay time of the peak magnitude), and W_(IACC) (width of the peak magnitude) are extracted from the IACF.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overview diagram illustrating an arrangement of an embodiment of the sound evaluating system according to the invention;

FIG. 2 is a flowchart showing the method for identifying and subjectively evaluating the noise source;

FIG. 3 is a schematic diagram depicting a procedure of the peak detection, an upper block of the diagram showing a graph plotting a noise level on a vertical axis and time on a horizontal axis, and a lower block of the diagram showing a diagram depicting integral intervals;

FIG. 4 is a graph plotting absolute values of logarithm of the ACF on a vertical axis and the delay time on a horizontal axis;

FIG. 5 is a graph plotting values of the normalized ACF on a vertical axis and values of the delay time on a horizontal axis;

FIG. 6 is a graph plotting values of the normalized IACF on a vertical axis and values of the delay time of the right and left ear signals on a horizontal axis;

FIG. 7 is a block diagram illustrating a method for calculating weighting coefficients;

FIG. 8 is a block diagram showing a model of the auditory-brain system;

FIG. 9 is a graph plotting scale values of loudness on a vertical axis and values of bandwidth on a horizontal axis;

FIG. 10 is a graph plotting scale value of diffuseness on a left vertical axis, value of IACC on a right vertical axis and horizontal angle of reflection on a horizontal axis;

FIG. 11(a) is a graph plotting values of ASW on a vertical axis and values of IACC on a horizontal axis, and

FIG. 11(b) is a graph plotting values of ASW on a vertical axis and values of W_(IACC) on a horizontal axis;

FIG. 12 is a graph plotting scale values of ASW (actual measurements) on a vertical axis and calculated scale values of ASW on a horizontal axis;

FIG. 13 is a graph plotting values of threshold of the signal on a vertical axis and values of the delay time on a horizontal axis;

FIG. 14 is a block diagram depicting a basic configuration of the system for evaluating the sound according to the invention;

FIG. 15 is a flowchart illustrating the basic steps in a method for evaluating the sound;

FIG. 16 is a graph plotting values of amplitude on a vertical axis and values of the delay time on a horizontal axis, which shows a wave shape of the normalized ACF extracted from captured piano sound signals; and

FIG. 17 is a graph representing a pitch (Hertz) on a vertical axis and a tone scale of twelve-degree temperament on a horizontal axis.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is an overview diagram illustrating an arrangement of an embodiment of the sound evaluating system according to the invention. As shown in FIG. 1, the system according to the invention includes a listener's dummy head 1, binaural sound recording means 2 (microphones), which is mounted on the dummy head 1, for capturing a sound from the noise source and converting the sound into acoustic signals, an LPF (low pass filter) 3, an A/D converter 4 and a computer 5. An actual human head is most desirable for the sound recording, but it is inconvenient in practice. Therefore, the dummy head 1 modeled as a human head is utilized. A precisely manufactured dummy head is very expensive, thus in the present embodiment, a model of a head made of polystyrene foam and having a spherical shape (with a diameter of 20 cm) is used. Even using such a simple dummy head, it is possible to obtain an autocorrelation function and an interaural crosscorrelation function which do not significantly differ from those obtained with the aid of the precisely made dummy head. The computer 5 includes storage means 6 for storing the captured acoustic signals, ACF calculating means 7 for reading the stored acoustic signals (right and left ear dual channel signals) to calculate an autocorrelation function from the readout acoustic signals, IACF calculating means 8 for calculating an interaural crosscorrelation function from the readout acoustic signals, ACF factor calculating means 9 for calculating ACF factors from the calculated ACF, IACF factor calculating means 10 for calculating IACF factors from the calculated IACF, noise source identification means 11 for identifying a kind of a noise source based on the calculated ACF factors, subjective evaluation means 12 for subjectively evaluating the sound based on the ACF factors and/or IACF factors, and a database 13 concerned with data for the identification and subjective evaluation of a noise source.

Capacitor microphones (with microphone amplifiers) mounted on the right and left sides of the head model 1 are connected with sound input/output terminals (A/D converter 4) of the portable PC 5 via the LPFs. The microphones 2 are used to capture an environmental noise. Under a program loaded into the PC, measurements, computation of physical factors, identification of a kind of a noise source and subjective evaluation are executed. The database concerned with data for identification of a kind of a noise source and for subjective evaluation is built.

FIG. 2 is a flowchart showing the method for identifying and subjectively evaluating the noise source. As shown in FIG. 2, in a step S1, a sound from the noise source is captured and is converted into acoustic signals by using the sound recording means 2. The captured acoustic signals are supplied via the LPF 3 to the A/D converter 4 and are converted into digital values. In a next step S2, the acoustic signals captured in the step S1 are stored. In a step S3, the acoustic signals stored in the step S2 are read out. In a step S4, the ACF calculating means 7 and IACF calculating means 8 calculate ACFs and an IACF based on the readout acoustic signals. In a step S5, the ACF factor calculating means 9 and IACF factor calculating means 10 calculate ACF factors and IACF factors based on the ACFs and IACF calculated in the step S4. In a step S6, the noise source identification means 11 identifies a kind of the noise source and the subjective evaluation means 12 evaluates subjectively the sound. When the identification and the evaluation are executed, necessary data is read out of the database 13 storing templates as reference data, and the calculated factors are compared with the reference data.

To begin with, a number of measurement sessions are extracted from the captured acoustic signal by a peak-detection process. In order to automatically extract environmental noises or target noises from a continuous noise, the monaural energy Φ_(ll)(0) or Φ_(rr)(0), i.e. the energy at the left or right ear entrance respectively, is continuously analyzed. FIG. 3 is a schematic diagram depicting a procedure of the peak detection; an upper block of the diagram shows a graph plotting a noise level on a vertical axis and time on a horizontal axis, and a lower block of the diagram shows a diagram depicting integral intervals. The interval for the calculation of Φ(0) can be fairly long, say 1 second, when the noise is a continuous noise such as an aircraft noise or a railway noise, but a shorter interval must be used when the noise is brief or intermittent. For the running calculation in equation (1) described below, however, it may be necessary to select an interval longer than the integration interval. Thus, this time interval must be determined according to the kind of the noise source.

This enables Φ(0) to be determined more accurately than it can be determined when using a conventional sound level meter with a long time constant. The peaks cannot be detected unless the trigger level L_(trig) is properly set in advance. The appropriate L_(trig) value also varies according to the kind of the target noise, the distance between the target and the receiver, and atmospheric conditions. It must therefore be determined by means of a preliminary measurement. It is easy to determine the value of L_(trig) when the distance between the target and the receiver is short and there is no interfering noise source near the receiver. The noise centered on its maximum Φ(0) is recorded on the system as a single session. The duration of one session for each target noise (i.e. t_(s)) should be selected so as to include the Φ(0) peak after the L_(trig) value is exceeded. For normal environmental noise such as an aircraft noise or a railway noise, the value of t_(s) is approximately 10 s. This is different from a steady-state noise with longer duration or an intermittent noise with shorter duration. It is noted that the present system cannot be used when there are interfering noises. As shown in FIG. 3, the set of sessions {S₁(t), S₂(t), S₃(t), . . . S_(N)(t); N: the number of sessions, 0<t<t_(s)} is stored on the system automatically.
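The session extraction described above can be illustrated with a short sketch. It assumes, hypothetically, that the running level derived from Φ(0) is already available as a list of values in dB, one per analysis interval, and that the session length is expressed as a number of such intervals; the function name and the example values are illustrative only.

```python
def extract_sessions(levels_db, l_trig_db, session_len):
    """Return (start, end) index pairs of sessions centred on each level peak above l_trig_db."""
    sessions = []
    i, n = 0, len(levels_db)
    while i < n:
        if levels_db[i] > l_trig_db:
            # Walk to the end of the contiguous run that stays above the trigger level.
            j = i
            while j < n and levels_db[j] > l_trig_db:
                j += 1
            # The session is centred on the maximum level inside that run.
            peak = max(range(i, j), key=lambda k: levels_db[k])
            start = max(0, peak - session_len // 2)
            end = min(n, start + session_len)
            sessions.append((start, end))
            i = j
        else:
            i += 1
    return sessions

# Made-up running levels containing two events above a trigger level of 70 dB.
levels = [50, 52, 60, 72, 80, 71, 58, 51, 50, 50, 65, 75, 66, 50]
sessions = extract_sessions(levels, l_trig_db=70, session_len=4)
print(sessions)
```

In this sketch a session is clipped at the start or end of the recording; a real measurement would more likely discard such truncated sessions.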

The running ACF and running IACF for each session S_(N)(t) with duration t_(s) are analyzed as shown in FIG. 3. Here we consider only a single session in order to explain the process of “running”. Appropriate values for the integration interval 2T and running step t_(step) are determined before the calculation. As mentioned above, the recommended integration interval is approximately 30(τ_(e))_(min) [ms], where (τ_(e))_(min) is the minimum value of the running series of values τ_(e), and can easily be found by the preliminary measurement. This recommendation was found by use of data of different kinds of environmental noises. In most cases, adjoining integration intervals overlap each other.

The ACF and the IACF are calculated at every step (n=1, 2, . . . , M) within one session over the range 2T, which shifts by t_(step) at every step, as {(0,2T),(t_(step),t_(step)+2T),(2t_(step),2t_(step)+2T), . . . ,((M−1)t_(step),(M−1)t_(step)+2T)}. Physical factors are extracted from each step of the ACF and the IACF. The 2T must be sufficiently longer than the expected value of τ_(e). Also, it is deeply related to an “auditory time-window” for sensation of each step. A 2T between 0.1 and 0.5 s may be appropriate for an environmental noise. If 2T is less than this range, the τ_(e)(min) converges at a certain value. Generally, it is preferable that t_(step) is about 0.1 s. If a more detailed activity of fluctuation is necessary, a shorter t_(step) should be selected. As is well known, the ACF and the IACF are obtained by applying the FFT (fast Fourier transform) to the binaural signals and then applying the inverse FFT. The A-weighting filter and the frequency characteristics of the microphones must be taken into consideration after the FFT process.
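The running analysis can be sketched as below. Note the substitution: to keep the example dependency-free it computes the ACF by a direct time-domain sum rather than by the FFT and inverse FFT route mentioned in the text, and it omits A-weighting and microphone correction. The function names, the test tone, and the parameter values are assumptions made for illustration.

```python
import math

def running_frames(n_samples, fs, two_t, t_step):
    """Yield (start, end) sample indices of integration windows 2T shifted by t_step."""
    win, hop = int(round(two_t * fs)), int(round(t_step * fs))
    start = 0
    while start + win <= n_samples:
        yield start, start + win
        start += hop

def acf(frame, max_lag):
    """Direct time-domain ACF of one frame for lags 0..max_lag (same term count per lag)."""
    n = len(frame) - max_lag
    return [sum(frame[i] * frame[i + lag] for i in range(n)) / n
            for lag in range(max_lag + 1)]

# Example: 1 s of a 100 Hz tone sampled at 1 kHz, 2T = 0.5 s, t_step = 0.1 s.
fs = 1000.0
signal = [math.sin(2.0 * math.pi * 100.0 * i / fs) for i in range(1000)]
frames = list(running_frames(len(signal), fs, two_t=0.5, t_step=0.1))
running_acf = [acf(signal[s:e], max_lag=20) for s, e in frames]
print(len(frames), "frames; first frame covers samples", frames[0])
```

With these values adjoining windows overlap by 0.4 s, which corresponds to the overlapping integration intervals described for FIG. 3.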

The ACFs at the left and right ears are represented as Φ_(ll)(τ) and Φ_(rr)(τ), respectively. In discrete form, they are represented as Φ_(ll) ^((i)) and Φ_(rr) ^((i)) (1≦i≦Tf; f: sampling frequency [Hz]; i: integer). In the calculation of Φ(0) for the left and right values, Φ_(ll) ^((i)) and Φ_(rr) ^((i)) are averaged as follows:

$\Phi_{ll,rr}(0) = \frac{1}{Tf}\left(\sum_{i=1}^{Tf}\left(\Phi_{ll,rr}^{(i)}\right)^{2}\right)^{1/2}$  (1)

An accurate value for the SPL is given by the following equation:

$\begin{aligned} \mathrm{SPL} &= 10\log_{10}\sqrt{\Phi_{ll}(0)\,\Phi_{rr}(0)} - 10\log_{10}\Phi_{ref}(0) \\ &\approx 10\log_{10}\Phi_{ll}(0) - 10\log_{10}\Phi_{ref}(0) \\ &\approx 10\log_{10}\Phi_{rr}(0) - 10\log_{10}\Phi_{ref}(0) \end{aligned}$  (2)

where Φ_(ref)(0) is the Φ(0) at the reference sound pressure, 20 μPa. The binaural listening level is the geometric mean of Φ_(ll)(0) and Φ_(rr)(0):

$\Phi(0) = \sqrt{\Phi_{ll}(0)\,\Phi_{rr}(0)}$  (3)

Since this Φ(0) is the denominator for normalization of the IACF, it can be classified as one of the IACF factors, or the right-hemispheric spatial factors.
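Equations (2) and (3) amount to a few lines of arithmetic, shown here as a minimal sketch. The function names and the numerical values are illustrative assumptions; in practice Φ_(ref)(0) would come from a calibration at the reference sound pressure.

```python
import math

def binaural_energy(phi_ll0, phi_rr0):
    """Geometric mean of the left and right ear energies, as in equation (3)."""
    return math.sqrt(phi_ll0 * phi_rr0)

def spl_db(phi_ll0, phi_rr0, phi_ref0):
    """Level in dB relative to the reference energy, first line of equation (2)."""
    return 10.0 * math.log10(binaural_energy(phi_ll0, phi_rr0)) - 10.0 * math.log10(phi_ref0)

# Illustrative values in arbitrary units, with the reference energy set to 1.
level = spl_db(phi_ll0=100.0, phi_rr0=10000.0, phi_ref0=1.0)
print(f"{level:.1f} dB")
```

When the two ear energies are nearly equal, the geometric mean is close to either of them, which is why the second and third lines of equation (2) are written as approximations.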

The effective duration τ_(e) is defined by the delay time at which the envelope of the normalized ACF becomes 0.1 (the ten-percentile delay). The normalized ACF for the left and right ears, φ_(ll,rr)(τ), is obtained as

$\varphi_{ll,rr}(\tau) = \frac{\Phi_{ll,rr}(\tau)}{\Phi_{ll,rr}(0)}$  (4)

FIG. 4 is a graph plotting absolute values of logarithm of the ACF on a vertical axis and the delay time on a horizontal axis. As shown in FIG. 4, it is easy to obtain τ_(e) if the vertical axis is transformed into a decibel (logarithmic) scale, because a linear decay of the initial ACF is usually observed. For the linear regression, the least mean squares (LMS) method is applied to the ACF peaks obtained within each constant short time range Δτ. The Δτ is used for the detection of peaks of the ACF and must be carefully determined before the calculation. In calculating τ_(e), the origin of the ACF (=0, at τ=0) is sometimes excluded if the origin is not on the regression line. As an extreme example, if the target noise consists of a pure tone and a white noise, rapid attenuation at the origin due to the white-noise component is observed, and the subsequent decay is kept flat because of the pure-tone component. In such a case, the solution of the ACF function is not found.
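The regression procedure for τ_(e) can be sketched as follows. The sketch makes several assumptions that the text does not fix: the level is taken as 10·log₁₀|φ(τ)|, so that an envelope of 0.1 corresponds to −10 dB; the peak of each Δτ bin is simply the sample of largest magnitude; the origin is kept in the fit; and an ordinary least-squares line is used. The function name and the synthetic test signal are hypothetical.

```python
import math

def effective_duration(phi, fs, delta_tau, floor_db=-10.0):
    """Estimate tau_e (s): delay where the line fitted to the peak levels reaches floor_db."""
    bin_len = max(1, int(round(delta_tau * fs)))
    taus, levels = [], []
    for start in range(0, len(phi), bin_len):
        seg = phi[start:start + bin_len]
        k = max(range(len(seg)), key=lambda i: abs(seg[i]))   # peak within this bin
        if abs(seg[k]) > 0.0:
            taus.append((start + k) / fs)
            levels.append(10.0 * math.log10(abs(seg[k])))
    if len(taus) < 2:
        raise ValueError("need at least two peaks for the regression")
    mean_t, mean_l = sum(taus) / len(taus), sum(levels) / len(levels)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(taus, levels))
             / sum((t - mean_t) ** 2 for t in taus))
    if slope >= 0.0:
        return None          # no decay observed, so no tau_e can be found
    intercept = mean_l - slope * mean_t
    return (floor_db - intercept) / slope

# Synthetic normalized ACF: 500 Hz cosine whose envelope falls to 0.1 at 50 ms.
fs = 10000.0
phi = [10.0 ** (-(i / fs) / 0.05) * math.cos(2.0 * math.pi * 500.0 * i / fs)
       for i in range(1000)]
tau_e = effective_duration(phi, fs, delta_tau=0.002)
print(f"tau_e = {1000.0 * tau_e:.1f} ms")
```

Returning `None` for a non-decaying envelope mirrors the pure-tone case in the text, where no solution is found.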

FIG. 5 is a graph plotting values of the normalized ACF on a vertical axis and values of the delay time on a horizontal axis. As shown in FIG. 5, τ₁ and φ₁ are respectively the delay time and amplitude of the first peak of the normalized ACF. The first maximum must be determined as a main peak avoiding local minor peaks. The factors τ_(n) and φ_(n) (n≧2) are excluded because they are usually related to τ₁ and φ₁.
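One possible way of picking τ₁ and φ₁ is sketched below. The rule used to avoid local minor peaks (skip the main lobe around zero delay, then take the largest local maximum) is an assumption of this sketch, not a procedure stated in the text, and the function name and test signal are hypothetical.

```python
import math

def first_peak(phi, fs):
    """Return (tau_1 in s, phi_1) for the dominant peak of a normalized ACF after lag 0."""
    # Skip the main lobe around zero delay: start at the first non-positive sample.
    start = next((i for i, v in enumerate(phi) if v <= 0.0), None)
    if start is None:
        return None
    # Collect local maxima beyond the main lobe.
    peaks = [i for i in range(max(start, 1), len(phi) - 1)
             if phi[i - 1] < phi[i] >= phi[i + 1]]
    if not peaks:
        return None
    k = max(peaks, key=lambda i: phi[i])    # the largest one is taken as the main peak
    return k / fs, phi[k]

# Synthetic normalized ACF: decaying 500 Hz cosine sampled at 10 kHz.
fs = 10000.0
phi = [10.0 ** (-(i / fs) / 0.05) * math.cos(2.0 * math.pi * 500.0 * i / fs)
       for i in range(400)]
tau1, phi1 = first_peak(phi, fs)
print(f"tau_1 = {1000.0 * tau1:.2f} ms, phi_1 = {phi1:.3f}")
```

For this test signal the reciprocal of τ₁ is 500 Hz, consistent with the use of τ₁ as a pitch-related factor.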

FIG. 6 is a graph plotting values of the normalized IACF on a vertical axis and values of the delay time of the left and right ear signals on a horizontal axis. The IACF between sound signals at the left and right ears is represented as Φ_(lr)(τ) (−1<τ<+1 [ms]). In digital form, it is represented as Φ_(lr) ^((i)) (−ƒ/10³≦i≦ƒ/10³; i: integer, where negative values signify the IACF as the left channel delayed). It is enough to consider only the range from −1 to +1 ms, which is the maximum possible delay between the ears. The maximum amplitude, i.e. the IACC, is a factor related to the subjective diffuseness. As shown in FIG. 6, it is obtained as the maximum amplitude of the normalized IACF φ_(lr) ^((i)) within the delay range. Thus,

$\mathrm{IACC} = \left\{\varphi_{lr}^{(i)}\right\}_{\max}$  (5)

The normalized IACF is given by

$\varphi_{lr}^{(i)} = \frac{\Phi_{lr}^{(i)}}{\Phi(0)}$  (6)

The value of τ_(IACC) is simply obtained as the time delay of the maximum amplitude. For example, if τ_(IACC) is greater than zero (positive), the sound source is on the right side of the receiver or is perceived as if it were. As shown in FIG. 6, the value of W_(IACC) is given by the width of the peak at the level 0.1 (IACC) below the maximum value. The coefficient 0.1 is used as an approximation of the JND (just noticeable difference) at IACC=1.0. The listening level LL is obtained in the manner represented in equation (2), replacing SPL with LL. Thus, each of the physical factors can be obtained based on the ACF and IACF.
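The extraction of IACC, τ_(IACC) and W_(IACC) from a sampled normalized IACF can be sketched as follows. The sketch assumes the samples cover lags −L to +L with the zero lag in the middle, and it measures the width on the sample grid without interpolation; the function name and the triangular test curve are hypothetical.

```python
def iacf_factors(phi_lr, fs):
    """Return (IACC, tau_IACC in s, W_IACC in s) from normalized IACF samples for lags -L..+L."""
    half = len(phi_lr) // 2                       # index of zero lag
    k = max(range(len(phi_lr)), key=lambda i: phi_lr[i])
    iacc = phi_lr[k]                              # equation (5)
    threshold = iacc - 0.1 * iacc                 # level 0.1*IACC below the maximum
    lo = k
    while lo > 0 and phi_lr[lo - 1] >= threshold:
        lo -= 1
    hi = k
    while hi < len(phi_lr) - 1 and phi_lr[hi + 1] >= threshold:
        hi += 1
    return iacc, (k - half) / fs, (hi - lo) / fs

# Made-up triangular IACF at fs = 10 kHz: lags -10..+10 samples, i.e. -1..+1 ms.
fs = 10000.0
phi_lr = [0.8 - 0.05 * abs(i - 12) for i in range(21)]
iacc, tau_iacc, w_iacc = iacf_factors(phi_lr, fs)
print(f"IACC={iacc:.2f}, tau_IACC={1000*tau_iacc:.1f} ms, W_IACC={1000*w_iacc:.1f} ms")
```

A finer estimate of W_(IACC) would interpolate between samples at the threshold crossings; the grid-based width is used here only to keep the sketch short.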

Next, the method for identifying the kinds of noise sources based on the ACF factors will be described.

The kinds of the noise sources are identified using four ACF factors: energy Φ(0) represented at the origin of the delay (i.e. delay time is zero); effective duration τ_(e); delay time of the first (maximum) peak τ₁; and amplitude of the first (maximum) peak of the normalized ACF φ₁. Since Φ(0) varies according to the distance between the source and the receiver, special attention is paid to the conditions for calculation if the distance is unknown. Even if the factor Φ(0) is not useful, the noise source can be identified by using the other three factors. The remaining IACF factors may be taken into account if the spatial information is changed. One of the reasons for using the minimum τ_(e) (i.e. (τ_(e))_(min)), which represents the most active part of the noise signal, is that this part is most deeply associated with subjective responses. The distances between the values of each factor at (τ_(e))_(min) for the unknown target data (indicated by the symbol a in equations (7) to (10) below) and the values for the template (indicated by the symbol b) are calculated. Here, “target” means an environmental noise to be identified by the present system. Template values are a set of typical ACF factors for a specific environmental noise, and these templates are used for comparison with an unknown noise.

The distance D(x) (x: Φ(0), τ_(e), τ₁, and φ₁) is calculated in the following manner.

D(Φ(0))=|log(Φ(0))^(a) −log(Φ(0))^(b)|  (7)

D(τ_(e))=|log(τ_(e))_(min) ^(a) −log(τ_(e))_(min) ^(b)|  (8)

D(τ₁)=|log(τ₁)^(a) −log(τ₁)^(b)|  (9)

D(φ₁)=|log(φ₁)^(a) −log(φ₁)^(b)|  (10)

The total distance D of the target can be represented by the following equation.

D=W^(Φ(0)) D(Φ(0))+W^(τ_(e)) D(τ_(e))+W^(τ₁) D(τ₁)+W^(φ₁) D(φ₁)  (11)

where W^((x)) (x: Φ(0), (τ_(e))_(min), τ₁, φ₁) signifies the weighting coefficient. The template with the smallest D can be taken as the identified noise source. Thereby it can be identified what the unknown noise source is (such as railway noise, automobile noise, or factory noise), and also its type, such as the type of car or the type of machine.
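By way of illustration only, the distance calculation of equations (7) to (11) and the selection of the nearest template may be sketched as follows. The base of the logarithm is not stated in the text and is assumed here to be 10; the dictionary keys, the function names and all numeric template values are hypothetical and serve only to make the sketch runnable.

```python
import math

# Hypothetical keys for the four ACF factors: Phi(0), (tau_e)min, tau_1, phi_1
FACTORS = ("phi0", "tau_e_min", "tau1", "phi1")

def factor_distance(a, b):
    """Equations (7)-(10): absolute difference of the logarithms (base 10 assumed)."""
    return abs(math.log10(a) - math.log10(b))

def total_distance(target, template, weights):
    """Equation (11): weighted sum of the per-factor distances."""
    return sum(weights[f] * factor_distance(target[f], template[f]) for f in FACTORS)

def identify(target, templates, weights):
    """Return the name of the template having the smallest total distance D."""
    return min(templates, key=lambda name: total_distance(target, templates[name], weights))

# Illustrative values only, not measured data
templates = {
    "automobile": {"phi0": 1.0e-3, "tau_e_min": 0.020, "tau1": 0.0050, "phi1": 0.30},
    "railway": {"phi0": 2.0e-3, "tau_e_min": 0.080, "tau1": 0.0020, "phi1": 0.60},
}
weights = {f: 1.0 for f in FACTORS}
target = {"phi0": 1.2e-3, "tau_e_min": 0.025, "tau1": 0.0045, "phi1": 0.35}
print(identify(target, templates, weights))
```

If Φ(0) is not useful because the source distance is unknown, the corresponding weight may simply be set to zero in such a sketch, so that only the other three factors contribute to D.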

FIG. 7 is a block diagram illustrating a method for calculating weighting coefficients. Weighting coefficients W^((x)) (x: Φ(0), (τ_(e))_(min), τ₁, φ₁) in equation (11) are obtained in the following manner by use of statistical values s₁ and s₂. As shown in FIG. 7, the value of s₁ is obtained for each ACF factor by first calculating the standard deviation within each category (a category representing one kind of noise) and then taking the arithmetic mean of these standard deviations over all categories. The value of s₂ is obtained by first calculating the arithmetic mean of each factor within each category and then taking the standard deviation of these means over all categories. Weighting coefficients W^((x)) are given as (s₂/s₁)^(½) after normalization by the maximum value among the factors, {(s₂/s₁)^(½)}_(max). The use of the square root was determined experimentally. Since a factor with a larger SD between noise sources and a smaller SD within a given source better distinguishes the different kinds of noise, the weighting of such a factor should be larger than that of the other factors. If a learning function for improving the templates is provided, a template may be successively overwritten with the average of the latest values and the previous values of each ACF factor in the system.
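The calculation of the weighting coefficients described with reference to FIG. 7 may be sketched as follows. This is a minimal sketch under stated assumptions: the population standard deviation is used (the text does not say whether the population or the sample form is meant), the factor values are used as supplied by the caller, and the data layout and function name are hypothetical. The case s₁=0 is not handled.

```python
import math
import statistics

def weighting_coefficients(data):
    """data: {category: {factor: [values, ...]}}  ->  {factor: weight}.

    s1 = mean over categories of the within-category standard deviation
    s2 = standard deviation over categories of the within-category mean
    weight = sqrt(s2 / s1), normalized by the largest value among the factors
    """
    categories = list(data)
    factors = list(data[categories[0]])
    raw = {}
    for f in factors:
        within_sd = [statistics.pstdev(data[c][f]) for c in categories]
        category_means = [statistics.mean(data[c][f]) for c in categories]
        s1 = statistics.mean(within_sd)
        s2 = statistics.pstdev(category_means)
        raw[f] = math.sqrt(s2 / s1)
    largest = max(raw.values())
    return {f: w / largest for f, w in raw.items()}

# Illustrative values only: tau1 separates the two categories well, phi1 does not
example = {
    "A": {"tau1": [1.0, 1.2, 0.8], "phi1": [0.5, 0.1, 0.9]},
    "B": {"tau1": [3.0, 3.2, 2.8], "phi1": [0.6, 0.2, 1.0]},
}
w = weighting_coefficients(example)
print(w)
```

In this example the factor that differs strongly between categories and varies little within a category receives the largest weight, which is the behavior the paragraph above calls for.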

FIG. 8 is a block diagram showing a model of the auditory-brain system. The model of the auditory-brain system includes an ACF mechanism, an IACF mechanism and a specialization of the cerebral hemispheres. It is noteworthy that the information included in the power spectrum of a signal is also included in the ACF of the sound signal. In order to describe the spatial sensation of the noise field, spatial factors extracted from IACFs are taken into account. Timbre is defined as an overall sensation including the primitive sensations and spatial sensations.

Using the auditory-brain model (FIG. 1), we now consider primary sensations of a given sound signal p(t) located in front of a listener in a free field. The long-term ACF is defined by $\begin{matrix} {{\Phi_{p}(\tau)} = {\lim_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{+ T}{{p^{\prime}(t)}{p^{\prime}\left( {t + \tau} \right)}\,{dt}}}}}} & (12) \end{matrix}$

where p′(t)=p(t)*s(t), the asterisk denoting convolution and s(t) being the ear sensitivity. For the sake of convenience, s(t) may be chosen as the impulse response of an A-weighted network. The power density spectrum may also be obtained from the ACF, as defined by $\begin{matrix} {{p_{d}(\omega)} = {\int_{- \infty}^{+ \infty}{{\Phi_{p}(\tau)}e^{- j\omega\tau}\,{d\tau}}}} & (13) \end{matrix}$ and $\begin{matrix} {{\Phi_{p}(\tau)} = {\int_{- \infty}^{+ \infty}{{p_{d}(\omega)}e^{j\omega\tau}\,{d\omega}}}} & (14) \end{matrix}$

Thus the ACF and power density spectrum mathematically contain the same information.

In the ACF analysis, there are three significant items:

(1) Energy represented at the origin of the delay, Φ_(p)(0);

(2) Effective duration of the envelope of the normalized ACF, τ_(e); and

(3) Fine structure, including peaks, dips and delays.

As shown in FIG. 4, τ_(e) is defined by the ten-percentile delay and represents a repetitive feature or reverberation contained in the noise source itself. As described above, the normalized ACF is defined by φ_(p)(τ)=Φ_(p)(τ)/Φ_(p)(0).
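As an illustrative sketch only, the factors τ_(e), τ₁ and φ₁ may be extracted from a sampled normalized ACF as follows. The text defines τ_(e) as the ten-percentile delay of the envelope but does not prescribe a numerical procedure; the sketch assumes that the envelope is represented by the local maxima of |φ_(p)(τ)| that are still at or above 0.1 (−10 dB), and that a straight line through 0 dB at τ=0 is fitted to their levels and extended to −10 dB. The function name and the synthetic test ACF are hypothetical.

```python
import math

def acf_factors(phi, fs):
    """phi: normalized ACF samples (phi[0] == 1) at delays k / fs.

    Returns (tau_e, tau1, phi1) with delays in seconds.
    """
    # Local maxima of phi at tau > 0; the largest one gives (tau1, phi1).
    peaks = [k for k in range(1, len(phi) - 1) if phi[k - 1] < phi[k] >= phi[k + 1]]
    k1 = max(peaks, key=lambda k: phi[k])
    tau1, phi1 = k1 / fs, phi[k1]

    # Envelope points: local maxima of |phi| at or above 0.1 (-10 dB).
    mag = [abs(v) for v in phi]
    env = [k for k in range(1, len(mag) - 1)
           if mag[k - 1] < mag[k] >= mag[k + 1] and mag[k] >= 0.1]

    # Least-squares line through (0 s, 0 dB) fitted to the envelope levels in dB.
    num = sum((k / fs) * 10.0 * math.log10(mag[k]) for k in env)
    den = sum((k / fs) ** 2 for k in env)
    slope = num / den          # dB per second, negative for a decaying envelope
    tau_e = -10.0 / slope      # delay at which the fitted envelope reaches -10 dB
    return tau_e, tau1, phi1

# Synthetic normalized ACF: exponentially decaying 100 Hz cosine (illustration only)
fs, tau0 = 8000, 0.02
phi = [math.exp(-k / (fs * tau0)) * math.cos(2.0 * math.pi * 100.0 * k / fs)
       for k in range(800)]
tau_e, tau1, phi1 = acf_factors(phi, fs)
print(tau_e, tau1, phi1)
```

For this synthetic ACF the envelope exp(−τ/τ₀) falls to 0.1 at τ=τ₀ ln 10, so the fitted τ_(e) should lie close to 46 ms, and τ₁ should lie close to the 10 ms period of the cosine.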

Loudness S_(L) is given by the following equation.

S _(L) =f _(L)(Φ_(p)(0), τ₁, φ₁, τ_(e))  (15)

In short, loudness can be obtained from the ACF factors: energy Φ(0) represented at the origin of the delay (i.e. delay time is zero); effective duration τ_(e); delay time of the maximum peak τ₁; and amplitude of the maximum peak of the normalized ACF φ₁. Here, the value of τ₁ corresponds to the pitch of the noise and/or the missing fundamental, as discussed below. When p′(t) is measured with reference to the pressure 20 μPa, leading to the level L(t), the equivalent sound pressure level L_(eq) can be obtained as follows. $\begin{matrix} {L_{eq} = {10\log\frac{1}{T}{\int_{0}^{T}{10^{\frac{L(t)}{10}}\,{dt}}}}} & (16) \end{matrix}$

This L_(eq) corresponds to 10 log Φ_(p)(0). Since the sampling frequency of the sound wave must be more than twice the maximum audio frequency, this value can be far more accurate than the L_(eq) measured by a conventional sound level meter.
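The correspondence between equation (16) and 10 log Φ_(p)(0) may be illustrated by the following minimal sketch, in which the integral is replaced by a mean over equally spaced samples and the logarithm is taken to base 10. The function names are hypothetical, and Φ_(p)(0) is assumed to be the mean square pressure expressed relative to the square of the 20 μPa reference.

```python
import math

P_REF = 20e-6  # reference pressure, 20 micropascals

def leq_from_levels(levels_db):
    """Equation (16), with the time integral replaced by a mean over samples of L(t)."""
    mean_power = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_power)

def leq_from_pressure(pressure_pa):
    """10 log10 of Phi_p(0), taken as mean square pressure relative to P_REF squared."""
    phi0 = sum(x * x for x in pressure_pa) / len(pressure_pa)
    return 10.0 * math.log10(phi0 / P_REF ** 2)

# Two equally long intervals at 60 dB and 70 dB (illustrative values)
print(leq_from_levels([60.0, 70.0]))
# A signal with a mean square pressure of 1 Pa^2
print(leq_from_pressure([1.0, -1.0, 1.0, -1.0]))
```

Because the averaging is performed on the energy scale, the louder interval dominates the result, which is why L_(eq) is not the arithmetic mean of the levels.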

FIG. 9 is a graph plotting scale values of loudness on a vertical axis and values of bandwidth on a horizontal axis. This graph shows the scale values of loudness within the critical band that were obtained in paired-comparison tests (with filters having a slope of 1080 dB/octave) under the condition of a constant Φ_(p)(0). Evidently, when a noise has a similar repetitive feature, τ_(e) becomes greater, as for a pure tone, and greater loudness results. Thus a plot of loudness versus bandwidth is not flat within the critical band. This result was obtained in the frequency range centered on 1 kHz.

Pitch or missing fundamental of the noise is defined by

S _(p) =f _(p)(τ₁, φ₁)  (17)

The missing fundamental phenomenon is that, when several harmonic tones are present, the listener perceives a pitch that does not in fact exist in the sound.

Timbre, which is the most complicated sensation and includes loudness and pitch, is defined by

S _(T) =f _(T)[Φ(0), τ_(e), (τ₁, φ₁), (τ′₁, φ′₁), . . . , (τ′_(n), φ′_(n))]  (18)

Since (τ₁, φ₁) are the most significant orthogonal factors among (τ_(n), φ_(n)) (n=1, 2, . . . ), equation (18) can be rewritten as

S _(T) =f _(T)[Φ(0), τ_(e), τ₁, φ₁]  (19)

Perception of the signal duration is given by

S _(D) =f _(D)[Φ(0), τ_(e), τ₁, φ₁]  (20)

The long-term IACF is given by $\begin{matrix} {{\Phi_{lr}(\tau)} = {\lim_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{+ T}{{p_{l}^{\prime}(t)}{p_{r}^{\prime}\left( {t + \tau} \right)}\,{dt}}}}}} & (21) \end{matrix}$

where p′_(l,r)(t)=p(t)_(l,r)*s(t), the asterisk denoting convolution and p(t)_(l,r) being the sound pressures at the left- and right-ear entrances.

The spatial factor, which includes the perceived direction of a noise source in the horizontal plane, is given by

S=f(LL, IACC, τ _(IACC) , W _(IACC))  (22)

where LL={Φ_(ll)(0), Φ_(rr)(0)}. The symbol {} signifies a set, Φ_(ll)(0) and Φ_(rr)(0) being the ACFs at τ=0 (sound energy) of the signals arriving at the left and right ear entrances. Mathematically, LL is expressed by the geometrical mean of the energies of the sound signals arriving at both ear entrances.

That is,

LL=10 log₁₀ √(Φ_(ll)(0)Φ_(rr)(0)) [dB]  (23)

Among these four orthogonal factors in equation (22), τ_(IACC) (within the range from −1 to +1 ms) is a significant factor with respect to the perceived direction of a noise source in the horizontal plane. A well-defined direction is perceived when the normalized interaural crosscorrelation function has one sharp maximum, that is, a high value of the IACC and a small value of W_(IACC), due to high frequency components. On the other hand, subjective diffuseness or an ambiguous impression of spatial direction is perceived when the IACC is low (IACC<0.15).

For the perception of a noise source located in the median plane, the temporal factors extracted from the long-term ACF of the sound signal arriving at the ear entrances should be added into equation (22).

As shown in FIG. 8, a remarkable finding is that there are neural activities at the inferior colliculus corresponding to the IACC and the sound energies for sound signals arriving at the two ear entrances. Also, it has been found that LL and IACC are dominantly associated with the right cerebral hemisphere, and the temporal factors Δt₁ and T_(sub) are associated with the left cerebral hemisphere.
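A minimal sketch of how the four spatial factors of equation (22) might be extracted from a finite pair of sampled ear signals is given below. It rests on several assumptions that are not taken from the text: the crosscorrelation is normalized by the square root of the product Φ_(ll)(0)Φ_(rr)(0), the IACC is taken as the maximum of the magnitude of the normalized IACF within ±1 ms, W_(IACC) is measured at sample resolution without interpolation, and LL is expressed relative to a unit reference. The function name and the test signals are hypothetical.

```python
import math
import random

def iacf_factors(left, right, fs, max_delay_s=1e-3):
    """Return LL, IACC, tau_IACC and W_IACC from two equally long sample lists."""
    m = int(round(max_delay_s * fs))        # largest delay in samples
    idx = range(m, len(left) - m)           # keeps t + tau inside the frame
    n = len(idx)
    phi_ll = sum(left[i] * left[i] for i in idx) / n
    phi_rr = sum(right[i] * right[i] for i in idx) / n
    norm = math.sqrt(phi_ll * phi_rr)
    lags = list(range(-m, m + 1))
    # Magnitude of the normalized IACF: mean of p_l(t) * p_r(t + tau) over the frame
    mag = [abs(sum(left[i] * right[i + k] for i in idx) / n / norm) for k in lags]
    j = max(range(len(mag)), key=mag.__getitem__)
    iacc = mag[j]
    # Width of the peak at the level 0.1 * IACC below the maximum
    level = iacc - 0.1 * iacc
    lo = hi = j
    while lo > 0 and mag[lo - 1] >= level:
        lo -= 1
    while hi < len(mag) - 1 and mag[hi + 1] >= level:
        hi += 1
    return {
        "LL": 10.0 * math.log10(norm),      # equation (23), relative to unit energy
        "IACC": iacc,
        "tau_IACC": lags[j] / fs,
        "W_IACC": (hi - lo) / fs,
    }

# Illustration: the right channel is the left channel delayed by 4 samples
rng = random.Random(0)
fs = 48000
left = [rng.gauss(0.0, 1.0) for _ in range(2400)]
right = [0.0] * 4 + left[:-4]
factors = iacf_factors(left, right, fs)
print(factors)
```

With the delay convention of equation (21), a right channel that lags the left channel by four samples produces the maximum at a delay of +4 samples. The sketch makes no statement about which physical source position corresponds to which sign.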

In order to obtain scale values of subjective diffuseness, paired-comparison tests using white noise were conducted while varying the horizontal angle of two symmetric reflections. Listeners judged which of two sound fields was perceived as more diffuse, under constant conditions of LL, τ_(IACC) and W_(IACC). FIG. 10 is a graph plotting the scale value of subjective diffuseness on a left vertical axis, the value of IACC on a right vertical axis and the horizontal angle of reflection on a horizontal axis. As shown in FIG. 10, a strong negative correlation between the scale value and the maximum amplitude, i.e. IACC, can be found in the results for frequency bands between 250 Hz and 4 kHz (in FIG. 10, (a): 250 Hz, (b): 500 Hz, (c): 1 kHz, (d): 2 kHz, (e): 4 kHz). The scale value of subjective diffuseness may be well formulated in terms of the ⅔ power of the IACC, and is defined as follows.

S _(diffuseness)=−α(IACC)^(β)  (24)

where coefficients α=2.9 and β=⅔, which are obtained experimentally.

A method for finding the apparent source width (ASW) of a noise field will be described. For a noise field with a predominantly low frequency range, the long-term IACF has no sharp peaks for the delay range of −1<τ<+1 ms, and W_(IACC) becomes wider. W_(IACC) is obtained by the following equation. $\begin{matrix} {W_{IACC}^{(\delta)} = {\frac{4}{\Delta\omega_{c}}{\cos^{- 1}\left( {1 - \frac{\delta}{IACC}} \right)}\ \lbrack s\rbrack}} & (25) \end{matrix}$

where Δω_(c)=2π(f₁+f₂), and f₁ and f₂ are the lower and upper frequencies of an ideal filter, respectively. For the sake of simplicity, δ is defined by 0.1 (IACC).

Of particular interest is that a wider ASW may be perceived with low frequency bands and by decreasing the IACC. More precisely, the ASW may be obtained from both factors, IACC and W_(IACC), under the conditions of a constant LL and τ_(IACC)=0. The scale values of ASW were obtained by paired-comparison tests with ten subjects as listeners. In order to control the values of W_(IACC), the center frequencies were varied within a range of 250 Hz to 2 kHz. The values of IACC were adjusted by controlling the sound pressure ratio of the reflections to the level of the direct sound. Because the listening level affects ASW, the total sound pressure levels at the ear entrances of all noise fields were kept constant at a peak of 75 dBA. The listeners judged which of two noise sources they perceived to be wider. The results of the analysis of variance for the scale values S_(ASW) indicate that both factors, IACC and W_(IACC), are significant (p<0.0) and contribute to S_(ASW) independently. Thus, S_(ASW) can be represented by the following equation.

S _(ASW) =a(IACC)^(3/2) +b(W _(IACC))^(½)  (26)

where coefficients a=−1.64 and b=2.44 are obtained by regressions of the scale values with ten subjects as shown in FIGS. 11(a) and 11(b), respectively.
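For illustration, equations (24), (25) and (26) may be evaluated numerically as sketched below, using the coefficients quoted in the text as defaults. The function names are hypothetical. The text does not state the unit in which W_(IACC) enters equation (26), so the sketch passes the value through unchanged and leaves the choice of unit to the caller.

```python
import math

def s_diffuseness(iacc, alpha=2.9, beta=2.0 / 3.0):
    """Equation (24): scale value of subjective diffuseness."""
    return -alpha * iacc ** beta

def w_iacc_ideal(iacc, f1, f2, delta=None):
    """Equation (25): W_IACC in seconds for an ideal filter from f1 to f2 (Hz)."""
    if delta is None:
        delta = 0.1 * iacc                  # delta = 0.1 (IACC), as in the text
    delta_omega_c = 2.0 * math.pi * (f1 + f2)
    return 4.0 / delta_omega_c * math.acos(1.0 - delta / iacc)

def s_asw(iacc, w_iacc, a=-1.64, b=2.44):
    """Equation (26): scale value of apparent source width (unit of w_iacc assumed by caller)."""
    return a * iacc ** 1.5 + b * math.sqrt(w_iacc)

print(s_diffuseness(1.0))
print(w_iacc_ideal(1.0, 250.0, 750.0))
print(s_asw(1.0, 1.0))
```

With δ=0.1 (IACC) the ratio δ/IACC is always 0.1, so in this form W_(IACC) depends only on the filter frequencies, becoming wider as f₁+f₂ decreases.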

FIG. 11(a) is a graph plotting values of ASW on a vertical axis and values of IACC on a horizontal axis, and FIG. 11(b) is a graph plotting values of ASW on a vertical axis and values of W_(IACC) on a horizontal axis. FIG. 12 is a graph plotting scale values of ASW (actual measurements) on a vertical axis and calculated scale values of ASW on a horizontal axis. It is found that the calculated scale values S_(ASW) by equation (26) and measured scale values are in good agreement (r=0.97, p<0.1).

To assess fluctuating environmental noise on the time axis, we use the running short-time ACF as well as the running short-time IACF. Running short-time spatial and temporal factors, extracted in a manner similar to that described above, are used to describe the primitive sensations of a fluctuating noise field. The short-time ACF is defined by $\begin{matrix} {{\Phi_{p}(\tau)} = {\frac{1}{2T}{\int_{- T}^{+ T}{{p^{\prime}(t)}{p^{\prime}\left( {t + \tau} \right)}\,{dt}}}}} & (27) \end{matrix}$

where 2T is determined by the signal duration to be analyzed. This 2T should be selected so as to cover at least the minimum value of the effective duration, (τ_(e))_(min). The piece having (τ_(e))_(min) contains the most rapid movement in the signal and thus most greatly influences the subjective responses.
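The running short-time ACF of equation (27) may be sketched as follows for a sampled signal. The window length 2T, the running step and the maximum delay are parameters chosen by the caller; the function name, the parameter names and the test signal are hypothetical. Each frame reads 2T plus the maximum delay so that p′(t+τ) is available for every t in the window.

```python
import math

def running_acf(signal, fs, window_s, step_s, max_lag_s):
    """Return a list of (start time in s, Phi(0), normalized ACF) for each frame."""
    win = int(round(window_s * fs))         # 2T in samples
    step = int(round(step_s * fs))
    max_lag = int(round(max_lag_s * fs))
    frames = []
    for start in range(0, len(signal) - win - max_lag + 1, step):
        seg = signal[start:start + win + max_lag]
        raw = [sum(seg[i] * seg[i + k] for i in range(win)) / win
               for k in range(max_lag + 1)]
        phi0 = raw[0]
        normalized = [v / phi0 for v in raw] if phi0 > 0.0 else None
        frames.append((start / fs, phi0, normalized))
    return frames

# Illustration: 1 s of a 100 Hz unit sine sampled at 2 kHz
fs = 2000
signal = [math.sin(2.0 * math.pi * 100.0 * i / fs) for i in range(fs)]
frames = running_acf(signal, fs, window_s=0.1, step_s=0.05, max_lag_s=0.02)
print(len(frames), frames[0][1], frames[0][2][20])
```

The factors of each frame can then be extracted from the normalized ACF, and the frame giving the smallest τ_(e) identifies (τ_(e))_(min).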

For the loudness S_(L) of each noise piece, equation (15) can be replaced by

S _(L) =f _(L)(LL, τ ₁, φ₁, τ_(e))  (28)

Here each factor is obtained for each noise piece, and the Φ(0) in equation (15) has been replaced by LL. It is worth noticing that the temporal factors extracted from the ACF must be influenced by repetitive reflections (Δt₁, Δt₂, . . . ) and the subsequent reverberation time (T_(sub)) in a room.

In describing the pitch of the environmental noise field, the significant temporal factors of the noise field are τ₁ and φ₁, thus equation (17) holds.

The timbre of the environmental noise field may be expressed by all of the temporal and spatial factors, so that

S _(T) =f _(T)(τ_(e), τ₁, φ₁, τ′_(n), φ′_(n), LL, IACC, τ _(IACC) , W _(IACC))  (29)

Considering the fact that the human cerebral hemispheres are specialized in such a way that temporal factors are associated with the left hemisphere and spatial factors are associated with the right hemisphere, we can rewrite equation (29) as

S _(T) =f _(T)(τ_(e), τ₁, φ₁)_(left)+f_(T)(LL, IACC, τ _(IACC) , W _(IACC))_(right)

The threshold of the weak reflection is shown in FIG. 3 as a function of Δt₁. The spatial direction of the reflection to the listener (IACC and τ_(IACC)) and the delay time of the reflection Δt₁, as included in equation (29), express this threshold.

The ear sensitivity may be characterized by the physical system including the external ear and the middle ear. Before analyzing the sound signal, we can use an A-weighting network for the sake of convenience.

The intelligibility of single syllables as a function of the delay time of a single reflection can be calculated from the four orthogonal factors extracted from the short-term ACF analyzed for the piece between consonant and vowel sounds. A recent investigation clearly shows that timbre or dissimilarity judgment is an overall subjective response similar to the subjective preference for sound fields in a concert hall. The subjective preference as well as timbre can be described by the use of the minimum value of τ_(e). A short-term integration time is given by

(2T)=30(τ_(e))_(min)  (30)

Effects of noise on mental tasks can be interpreted as an interference phenomenon between the task performance and the hemispheric dominance. Temporal factors extracted from the ACF are associated with the left cerebral hemisphere and spatial factors extracted from IACF are much concerned with the right.

FIG. 14 is a block diagram depicting a basic configuration of the sound evaluating system according to the invention. The system for evaluating the sound is basically the same as the system shown in FIG. 1. However, some of the components of the computer in this sound evaluation system (FIG. 14) are different from those of the system shown in FIG. 1. As shown in FIG. 14, the present sound evaluation system includes binaural sound recording means 20 (left and right dual channels), ACF calculating means 21 for calculating an autocorrelation function from the recorded sound signals, ACF factor calculating means 22 for calculating autocorrelation function factors from the calculated autocorrelation function, IACF calculating means 25 for calculating an interaural crosscorrelation function from the recorded sound signals, IACF factor calculating means 26 for calculating interaural crosscorrelation function factors from the calculated IACF, sound evaluation means 24 for evaluating the sound based on the calculated ACF factors, sound evaluation means 27 for evaluating the sound based on the calculated ACF factors and/or IACF factors, and data base 23 storing a wide variety of data.

FIG. 15 is a flowchart illustrating the basic steps of the sound evaluating method according to the invention. As shown in FIG. 15, in a step P1, a sound from a sound source is captured by microphones (not shown) and converted into sound signals. It should be noted that line-in digital or analogue signals may be used instead of the acoustic signals captured by the microphones. The recorded sound signals are supplied to an A/D converter (not shown) via LPF 3 (not shown) and are converted into digital values. In a step P2, a computer, i.e. a CPU, calculates the ACF and IACF by processing the digital sound signals. In a step P3, the CPU calculates ACF factors and IACF factors from the calculated ACF and IACF. A database storing the following data is established in advance: tone data associating tones with ACF factors, prosodic data associating prosodic elements with ACF factors, and evaluation data associating subjective evaluation values with ACF factors. Next, the association between the respective data in the database and the ACF or IACF factors will be described. For instance, a certain instrument is played, the sounds generated by the instrument are captured by the present system and converted into sound signals, and ACF factors and IACF factors are calculated. For the respective sounds, various kinds of data (such as tone color, timbre and subjective evaluation) are obtained by calculating ACF factors and IACF factors or by another known calculation method. Then, the ACF factors and IACF factors and the corresponding data for the respective sounds are stored in the database in a mutually associated manner. In a step P4, the CPU evaluates the sounds based on the calculated ACF factors and IACF factors. In performing this evaluation, the target data is read out from the database for comparison and consideration.

FIG. 16 is a graph showing a normalized ACF of acoustic signals captured from a piano, the amplitude being plotted on a vertical axis and values of the delay time being plotted on a horizontal axis. FIGS. 16(a), 16(b) and 16(c) show wave shapes of the normalized ACF extracted from captured piano sound signals sounded at the frequencies of A1 (55 Hz), A2 (220 Hz) and A6 (1760 Hz), respectively. In these figures, the maximum peak value is defined as φ₁ and its delay time as τ₁, and the peak values within that delay time are defined as φ′_(n) and their delay times as τ′_(n). Here, τ₁ and φ₁, which are physical factors, correspond to the pitch and the strength of the pitch, respectively. As shown in FIG. 16, the reciprocal of τ₁ corresponds to the pitch frequency. For example, in FIG. 16(a), the piano sound source sounds at a frequency of 55 Hz, and ACF factors are calculated from the piano sound signals. τ₁, one of the ACF factors, is approximately 18 ms, and the reciprocal of τ₁ is approximately 55 Hz. Thus it is found that the reciprocal of τ₁ corresponds to the pitch frequency of the piano sound source. This correspondence will be explained more clearly with reference to FIG. 17.

FIG. 17 is a graph plotting values of pitch on a vertical axis and values of the tonal scale in equal temperament of 12 degrees on a horizontal axis. Squares in the figure are values of pitch of the scale in equal temperament of 12 degrees, which are calculated using a predetermined formula. Black circles in the figure are values of pitch calculated based on the above-mentioned τ₁. As shown in FIG. 17, the values of pitch in equal temperament of 12 degrees coincide with the values of pitch extracted from τ₁. The pitch calculated from τ₁ at A6 deviates from the scale value, which suggests that this note was out of tune. In such a way, the ACF factor τ₁ can be utilized to tune not only a piano but also various other instruments. In addition, in this system, the database stores therein the correlation between the subjective evaluation of real instruments and various ACF factors (energy Φ(0) represented at the origin of the delay (i.e. delay time is zero); an effective duration τ_(e); a delay time of a first peak τ₁; an amplitude of a first peak of a normalized ACF φ₁; and information of respective peaks within the delay time (from zero to τ₁) extracted from ACFs, τ_(n)′, φ_(n)′ (n=1, 2, 3, . . . , N (N is an integer less than approximately 10))) and IACF factors extracted from the IACF. The present system can be used to compare the data stored in the database with the factors extracted from the sound signals generated by an instrument sound source. Thus the present system can be utilized as a support system for researching excellent sounds (having good timbre or good subjective evaluation).

Prior to tuning with the aid of the present system, tonal scale data, which includes various prosodic data (such as just or equal temperament), is stored in the database. It is preferable that the user can select a desired prosodic element according to his or her own purpose. Also, it is preferable to set equal temperament as a default. The reference tone A4 is set at 440 Hz as a default, and the value may be changed in steps of 1 Hz. In addition, the reference tone may be generated as a pure tone. In such a way, it is preferable that the reference frequency and the kind of tonal scale can be changed. If equal temperament is selected, the nearest note of the musical scale (such as A or C#) is presented. Alternatively, when desired data (i.e. a tonal scale) is selected from the database, a numeric value indicating the difference between the desired data and a sound generated by an instrument may be presented. For example, if the desired tonal scale is A2 (i.e. tuning to A2 is aimed at), the database is referenced, the pitch frequency associated with A2 is read out of the database, and the readout data is compared with the pitch frequency obtained from the τ₁ extracted from the sound so as to indicate a numeric value representing the difference between them. In this connection, the difference may be presented not only as a difference in numeric value but also as a difference in tonal scale. In addition, the difference between the tonal scale value of the sound and a desired tonal scale or the nearest tonal scale can be presented by this system, and the system can be equipped with an indicator showing whether the generated sound is higher than, lower than or the same as the value in the database.
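The tuning comparison described above may be sketched as follows for the equal temperament default. This is an illustrative sketch, not the tuning means of the invention: the note naming by MIDI note number, the expression of the difference in cents and the function names are assumptions introduced here, and the equal temperament frequencies are computed from the reference tone rather than read from a database.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_from_tau1(tau1_s):
    """Pitch frequency in Hz as the reciprocal of the delay time tau_1 in seconds."""
    return 1.0 / tau1_s

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, note frequency in Hz, deviation in cents) in equal temperament."""
    semitones_from_a4 = round(12.0 * math.log2(freq_hz / a4_hz))
    note_hz = a4_hz * 2.0 ** (semitones_from_a4 / 12.0)
    cents = 1200.0 * math.log2(freq_hz / note_hz)   # positive: sound is higher than the note
    midi = 69 + semitones_from_a4                   # MIDI number 69 is A4 (assumed naming)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, note_hz, cents

# tau_1 of approximately 18 ms, as read from FIG. 16(a)
freq = pitch_from_tau1(0.018)
name, note_hz, cents = nearest_note(freq)
print(name, note_hz, cents)
```

The sign of the deviation can drive the higher/lower indicator, and the reference tone can be changed through the `a4_hz` parameter.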

Thus, the sound evaluating system according to the present invention can be utilized to support the manufacturing of instruments generating excellent sounds in such a way that ACF factors and/or IACF factors calculated from the sound generated by the instrument are compared with corresponding data previously stored in the database. In particular, the present invention makes it possible to know the spatial impression of the sound field. Also, the system according to the present invention may be utilized not only by craftsmen but also by researchers, singers and players, so that instruments can be tuned toward their desired performances. The tuning and sound evaluation according to the invention can be utilized for various instruments such as acoustic musical instruments (such as keyboards, strings and wind instruments) and electronic musical instruments.

In consideration of the various embodiments of the systems or methods according to the invention, the principle of the invention can widely be applied. Further, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, and are not to be used to interpret the scope of the invention. Various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description. In this specification, although the pitch, which is one of the evaluation values, is described in detail, it should be understood that various other evaluations such as timbre or tone color may be made according to the present invention.

What is claimed is:
 1. A method for evaluating a sound, comprising the steps of: capturing the sound and converting the captured sound into an acoustic signal; calculating an autocorrelation function ACF from the acoustic signal with the aid of computing means; calculating autocorrelation function factors from the calculated autocorrelation function ACF with the aid of the computing means; and evaluating the sound based on the ACF factors and a preestablished database with the aid of the computing means, said database storing at least one of the following data: tone data associating tones with ACF factors, prosodic data associating prosodic elements with ACF factors and subjective evaluation data associating subjective evaluation values with ACF factors.
 2. The method according to claim 1, wherein said ACF factor calculating step comprises calculating at least one of the following ACF factors: energy Φ(0) represented at the origin of delay represented by a delay time of zero; effective duration τ_(e); delay time of a maximum peak τ₁; an amplitude of a maximum peak of a normalized ACF φ₁; and information of respective peaks within the delay time (from zero to τ₁) τ_(n)′, φ_(n)′ (n=1, 2, 3, . . . , N (N is an integer less than approximately 10)).
 3. The method according to claim 2, wherein the method further comprises the steps of: calculating a pitch frequency from the delay time of the maximum peak τ₁ of the ACF; and performing a tuning by comparing the calculated pitch frequency with data of a predetermined tonal scale stored in the database to derive a difference therebetween.
 4. The method according to claim 1, wherein the method further comprises the steps of: capturing the sound in a binaural manner and converting the captured sound into acoustic binaural signals; calculating an interaural crosscorrelation function IACF between the acoustic signals at left and right ears from the acoustic binaural signals with the aid of the computing means; calculating interaural crosscorrelation function factors from the calculated interaural crosscorrelation function IACF with the aid of the computing means; and evaluating the sound or evaluating subjectively the sound based on the IACF factors and/or the ACF factors and the said preestablished database with the aid of the computing means.
 5. The method according to claim 2, wherein the method further comprises the steps of: capturing the sound in a binaural manner and converting the captured sound into acoustic binaural signals; calculating an interaural crosscorrelation function IACF between the acoustic signals at left and right ears from the acoustic binaural signals with the aid of the computing means; calculating interaural crosscorrelation function factors from the calculated interaural crosscorrelation function IACF with the aid of the computing means; and evaluating the sound or evaluating subjectively the sound based on the IACF factors and/or the ACF factors and the said preestablished database with the aid of the computing means.
 6. A system for evaluating a sound comprising: sound capturing means for capturing the sound and converting the captured sound into an acoustic signal; ACF calculating means for calculating an autocorrelation function ACF from the acoustic signal; ACF factor calculating means for calculating autocorrelation function factors from the calculated autocorrelation function ACF; and evaluating means for evaluating the sound based on the ACF factors and a preestablished database, said database storing at least one of the following data: tone data associating tones with ACF factors, prosodic data associating prosodic elements with ACF factors and subjective evaluation data associating subjective evaluation values with ACF factors.
 7. The system according to claim 6, wherein said ACF factor calculating means comprises calculating means for calculating at least one of the following ACF factors: energy Φ(0) represented at the origin of delay defined by a delay time of zero; an effective duration τ_(e); a delay time of a maximum peak τ₁; an amplitude of a maximum peak of the normalized ACF φ₁; and information of respective peaks within the delay time (from zero to τ₁) τ_(n)′, φ_(n)′ (n=1, 2, 3, . . . , N (N is an integer less than approximately 10)).
 8. The system according to claim 7, wherein the system further comprises: calculating means for calculating a pitch frequency from the delay time τ₁ of the ACF; and tuning means for comparing the calculated pitch frequency with data of a predetermined tonal scale database to represent a difference between them.
 9. The system according to claim 6, further comprising: capturing means for capturing the sound in a binaural manner and converting the captured sound into acoustic binaural signals; IACF calculating means for calculating an interaural crosscorrelation function IACF between the acoustic signals at left and right ears from the acoustic binaural signals; IACF factor calculating means for calculating interaural crosscorrelation function factors from the calculated interaural crosscorrelation function IACF; and evaluating means for evaluating the sound or evaluating subjectively the sound based on the IACF factors and/or the ACF factors and the said predetermined database.
 10. The system according to claim 7, wherein the system further comprises: capturing means for capturing the sound in a binaural manner and converting the captured sound into acoustic binaural signals; IACF calculating means for calculating an interaural crosscorrelation function IACF between the acoustic signals at left and right ears from the acoustic binaural signals; IACF factor calculating means for calculating interaural crosscorrelation function factors from the calculated interaural crosscorrelation function IACF; and evaluating means for evaluating the sound or evaluating subjectively the sound based on the IACF factors and/or the ACF factors and the said predetermined database.